Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification
Pituková, Laura, Sinčák, Peter, Kovács, László József, Wang, Peng
This study conducts a comprehensive comparison of four neural network architectures: the Convolutional Neural Network, the Capsule Network, the Convolutional Kolmogorov-Arnold Network, and the newly proposed Capsule-Convolutional Kolmogorov-Arnold Network. The proposed Capsule-ConvKAN architecture combines the dynamic routing and spatial hierarchy capabilities of Capsule Networks with the flexible and interpretable function approximation of Convolutional Kolmogorov-Arnold Networks. This novel hybrid model was developed to improve feature representation and classification accuracy, particularly on challenging real-world biomedical image data. The architectures were evaluated on a histopathological image dataset, where Capsule-ConvKAN achieved the highest classification performance with an accuracy of 91.21%. The results demonstrate the potential of the newly introduced Capsule-ConvKAN to capture spatial patterns, manage complex features, and address the limitations of traditional convolutional models in medical image classification.
- North America > United States > Connecticut (0.04)
- Europe > Hungary > Borsod-Abaúj-Zemplén County > Miskolc (0.04)
- Asia > India (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Research Report (0.85)
- Instructional Material > Online (0.61)
- Instructional Material > Course Syllabus & Notes (0.61)
Rate of Model Collapse in Recursive Training
Suresh, Ananda Theertha, Thangaraj, Andrew, Khandavally, Aditya Nanda Kishore
Given the ease of creating synthetic data from machine learning models, new models can be potentially trained on synthetic data generated by previous models. This recursive training process raises concerns about the long-term impact on model quality. As models are recursively trained on generated data from previous rounds, their ability to capture the nuances of the original human-generated data may degrade. This is often referred to as "model collapse". In this work, we ask how fast model collapse occurs for some well-studied distribution families under maximum likelihood (ML or near ML) estimation during recursive training. Surprisingly, even for fundamental distributions such as discrete and Gaussian distributions, the exact rate of model collapse is unknown. In this work, we theoretically characterize the rate of collapse in these fundamental settings and complement it with experimental evaluations. Our results show that for discrete distributions, the time to forget a word is approximately linearly dependent on the number of times it occurred in the original corpus, and for Gaussian models, the standard deviation reduces to zero in roughly $n$ iterations, where $n$ is the number of samples at each iteration. Both of these findings imply that model forgetting, at least in these simple distributions under near ML estimation with many samples, takes a long time.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Hungary > Borsod-Abaúj-Zemplén County > Miskolc (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
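The Gaussian result in the abstract above can be illustrated with a minimal simulation: at each round a Gaussian is fit by maximum likelihood to samples drawn from the previous round's fit. This is a sketch, not the authors' code; the parameter values are arbitrary.

```python
import random
import statistics

def recursive_gaussian_fit(n=50, rounds=500, seed=0):
    """Repeatedly fit a Gaussian by ML to n samples drawn from the
    previous round's fitted Gaussian; return the fitted standard
    deviation after each round."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    sigmas = []
    for _ in range(rounds):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        # ML uses the population (biased) standard deviation.
        sigma = statistics.pstdev(samples)
        sigmas.append(sigma)
    return sigmas

sigmas = recursive_gaussian_fit()
# The fitted standard deviation drifts toward zero over the rounds,
# on a timescale governed by n, as the paper's rate analysis predicts.
print(sigmas[0], sigmas[-1])
```

Each round shrinks the fitted standard deviation by a random factor with slightly negative expected log, so collapse is slow but inevitable, matching the paper's qualitative conclusion.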
Neural Networks for Vehicle Routing Problem
The Vehicle Routing Problem concerns optimizing the routes of vehicles to meet the needs of customers at specific locations. The route graph consists of depots on several levels and customer positions. Several optimization methods have been developed over the years, most of which are based on some type of classic heuristic: genetic algorithms, simulated annealing, tabu search, ant colony optimization, or the firefly algorithm. Recent developments in machine learning provide a new toolset, the rich family of neural networks, for tackling complex problems. Neural networks are mainly applied to classification and regression; route optimization can be viewed as a new challenge for them. The article first presents an analysis of the applicability of neural network tools, then a novel graphical neural network model is presented in detail. An efficiency analysis based on test experiments shows the applicability of the proposed NN architecture.
Navigating Process Mining: A Case Study Using pm4py
Process-mining techniques have emerged as powerful tools for analyzing event data to gain insights into business processes. In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library in Python. We start by importing an event log dataset and explore its characteristics, including the distribution of activities and process variants. Through filtering and statistical analysis, we uncover key patterns and variations in the process executions. Subsequently, we apply various process-mining algorithms, including the Alpha Miner, Inductive Miner, and Heuristic Miner, to discover process models from the event log data. We visualize the discovered models to understand the workflow structures and dependencies within the process. Additionally, we discuss the strengths and limitations of each mining approach in capturing the underlying process dynamics. Our findings shed light on the efficiency and effectiveness of road traffic fine management processes, providing valuable insights for process optimization and decision-making. This study demonstrates the utility of pm4py in facilitating process mining tasks and its potential for analyzing real-world business processes.
- Europe > Hungary > Borsod-Abaúj-Zemplén County > Miskolc (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (4 more...)
- Workflow (0.67)
- Research Report > New Finding (0.34)
- Materials > Metals & Mining (0.67)
- Education (0.46)
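The discovery algorithms named in the abstract above (Alpha Miner, Heuristic Miner) all start from the directly-follows relation of the event log. A minimal stdlib sketch of that relation, with an invented toy log rather than the paper's road-traffic-fine data:

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is immediately followed by b
    across all traces; this relation is the starting point of the
    Alpha and Heuristic miners."""
    df = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

# Invented toy log: each trace is one case's sequence of activities.
log = [
    ["create fine", "send fine", "payment"],
    ["create fine", "payment"],
    ["create fine", "send fine", "insert fine notification", "payment"],
]
df = directly_follows(log)
print(df[("create fine", "send fine")])  # 2
```

In pm4py itself this bookkeeping is handled internally by the discovery functions; the sketch only shows the core relation the miners reason over.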
SPOT: Text Source Prediction from Originality Score Thresholding
Yvinec, Edouard, Kasser, Gabriel
The wide acceptance of large language models (LLMs) has unlocked new applications and social risks. Popular countermeasures aim at detecting misinformation and usually involve domain-specific models trained to recognize the relevance of any information. Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust. In this study, we define trust as the ability to know whether an input text was generated by an LLM or a human. To do so, we design SPOT, an efficient method that classifies the source of any standalone text input based on an originality score. This score is derived from the predictions of a given LLM and used to detect other LLMs. We empirically demonstrate the robustness of the method to the architecture, training data, evaluation data, task, and compression of modern LLMs.
- Education (1.00)
- Leisure & Entertainment > Sports > Soccer (0.68)
- Media (0.67)
Large Language Model (LLM) AI text generation detection based on transformer deep learning algorithm
Mo, Yuhong, Qin, Hao, Dong, Yushan, Zhu, Ziyi, Li, Zhenglin
In this paper, a tool for detecting LLM-generated text is developed based on the Transformer model, aiming to improve the accuracy of AI text generation detection and to provide a reference for subsequent research. First, the text is Unicode-normalised and converted to lowercase; characters other than alphabetic characters and punctuation marks are removed with regular expressions; spaces are added around punctuation marks; leading and trailing spaces are removed; consecutive ellipses are replaced with single spaces; and the text is joined using a specified delimiter. Next, non-alphabetic characters and extra whitespace characters are removed, multiple consecutive whitespace characters are replaced with a single space, and the text is again converted to lowercase. The deep learning model combines layers such as LSTM, Transformer, and CNN for text classification or sequence labelling tasks. On the training and validation sets, the model loss decreases from 0.127 to 0.005 and accuracy increases from 94.96% to 99.8%, indicating that the model has good detection and classification ability for AI-generated text. The test-set confusion matrix and accuracy show that the model predicts AI-generated text with 99% accuracy, a precision of 0.99, a recall of 1, and an F1 score of 0.99, achieving very high classification accuracy. Looking forward, it has prospects for wide application in the field of AI text detection.
- Oceania > Fiji (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (5 more...)
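The cleaning pipeline described in the abstract above can be approximated in a few lines. This is a rough sketch; the exact regular expressions are assumptions, since the paper gives no reference implementation.

```python
import re
import unicodedata

def preprocess(text):
    """Approximate the text-cleaning steps described above:
    normalise, lowercase, strip disallowed characters, space out
    punctuation, drop ellipses, and collapse whitespace."""
    # Unicode-normalise and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Keep only ASCII letters, whitespace and basic punctuation.
    text = re.sub(r"[^a-z\s.,!?']", " ", text)
    # Put spaces around punctuation so it tokenises separately.
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # Replace runs of dots (ellipses) with a single space.
    text = re.sub(r"(\s*\.\s*){2,}", " ", text)
    # Collapse consecutive whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Hello, World!"))  # hello , world !
```

The two-pass structure in the abstract (a second lowercase and whitespace pass) is folded into a single function here for brevity.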
Graphs Unveiled: Graph Neural Networks and Graph Generation
Embarking on the exploration of machine learning applied to graphs [1] invites us into a realm where graphs, representing connections between objects (nodes), become a universal language for deciphering complex systems [2]. For instance, in a social network graph, individuals are nodes, and friendships are edges. The power of this concept becomes evident in historical studies, like Wayne W. Zachary's analysis of a karate club's dynamics [3], predicting factional splits based on the graph structure. What makes graphs versatile is their ability to represent various interactions, be it in social networks, biology, or even telecommunications. Now, as we step into the world of machine learning, graphs become more than visual representations.
- Research Report (0.50)
- Overview (0.46)
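The nodes-and-edges view of a social network sketched in the abstract above can be made concrete with a minimal adjacency-list representation. The friendship data is invented, not the Zachary karate-club graph cited in the abstract.

```python
def degree(graph, node):
    """Number of friends (incident edges) of a node in an
    adjacency-list graph."""
    return len(graph[node])

# Toy friendship graph: people are nodes, friendships are edges.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}
print(degree(friends, "alice"))  # 2
```

Graph neural networks build on exactly this structure: node features are propagated along the adjacency lists.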
A Link between Coding Theory and Cross-Validation with Applications
Pahikkala, Tapio, Movahedi, Parisa, Montoya, Ileana, Miikonen, Havu, Foldes, Stephan, Airola, Antti, Major, Laszlo
How many different binary classification problems can a single learning algorithm solve on fixed data with exactly zero, or at most a given number of, cross-validation errors? While the number in the former case is known to be limited by the no-free-lunch theorem, we show that the exact answers are given by the theory of error-detecting codes. As a case study, we focus on the AUC performance measure and leave-pair-out cross-validation (LPOCV), in which every possible pair of data with different class labels is held out at a time. We show that the maximal number of classification problems with fixed class proportion, for which a learning algorithm can achieve zero LPOCV error, equals the maximal number of code words in a constant weight code (CWC), with certain technical properties. We then generalize CWCs by introducing light CWCs, and prove an analogous result for nonzero LPOCV errors and light CWCs. Moreover, we prove both upper and lower bounds on the maximal numbers of code words in light CWCs. Finally, as an immediate practical application, we develop new LPOCV based randomization tests for learning algorithms that generalize the classical Wilcoxon-Mann-Whitney U test.
- Europe > Finland > Southwest Finland > Turku (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Oncology (0.93)
- Health & Medicine > Diagnostic Medicine (0.67)
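LPOCV as defined in the abstract above holds out every pair of examples with different class labels, one pair at a time. A minimal sketch of enumerating those folds (illustrative only, not the authors' implementation; the toy labels are invented):

```python
from itertools import product

def lpocv_folds(labels):
    """Enumerate leave-pair-out folds: every (positive, negative)
    index pair is held out exactly once."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return list(product(pos, neg))

# Toy label vector with 2 positives and 3 negatives: 2 * 3 = 6 folds,
# one per pair scored by the AUC measure the paper studies.
labels = [1, 0, 1, 0, 0]
folds = lpocv_folds(labels)
print(len(folds))  # 6
```

The number of folds equals the number of pairs AUC ranks, which is why LPOCV pairs so naturally with that performance measure.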
Does the presence of informal language, such as emoticons, hashtags, and slang, impact the performance of sentiment analysis models on social media text?
This study aimed to investigate the influence of informal language, such as emoticons and slang, on the performance of sentiment analysis models applied to social media text. A convolutional neural network (CNN) model was developed and trained on three datasets: a sarcasm dataset, a sentiment dataset, and an emoticon dataset. The model architecture was held constant for all experiments, and the model was trained on 80% of the data and tested on the remaining 20%. The results revealed that the model achieved an accuracy of 96.47% on the sarcasm dataset, with the lowest accuracy for class 1. On the sentiment dataset, the model achieved an accuracy of 95.28%. Amalgamating the sarcasm and sentiment datasets resulted in an accuracy of 95.1%, and adding the emoticon dataset had a slight positive impact, raising the accuracy to 95.37%. The study suggests that the presence of informal language has a limited impact on the performance of sentiment analysis models applied to social media text, although including emoticon data can enhance accuracy slightly.
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.51)
- Information Technology > Services (0.47)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Intransitively winning chess players' positions
Positions of chess players in intransitive (rock-paper-scissors) relations are considered. Namely, position A of White is preferable (it should be chosen if a choice is possible) to position B of Black, position B of Black is preferable to position C of White, position C of White is preferable to position D of Black, but position D of Black is preferable to position A of White. The intransitivity of the winningness of chess positions is considered a consequence of the complexity of the chess environment, in contrast with simpler games that have only transitive positions. The space of relations between the winningness of chess players' positions is non-Euclidean. The Zermelo-von Neumann theorem is complemented by statements about the possibility vs. impossibility of building pure winning strategies based on the assumption of transitivity of positions. Questions about the possibility of intransitive player positions in other positional games are raised.
- Information Technology > Artificial Intelligence (0.48)
- Information Technology > Game Theory (0.46)
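The four-position cycle in the abstract above can be checked mechanically. A minimal sketch: the preference relation is transcribed directly from the abstract; everything else is illustrative.

```python
# Preference cycle from the abstract: A > B > C > D > A.
prefers = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")}

def is_transitive(rel):
    """A relation is transitive if x > y and y > z imply x > z
    for every chained pair in it."""
    return all(
        (x, z) in rel
        for (x, y) in rel
        for (y2, z) in rel
        if y == y2
    )

print(is_transitive(prefers))  # False: the cycle breaks transitivity
```

The check fails on the very first chain, (A, B) and (B, C), since (A, C) is not in the relation, which is exactly the rock-paper-scissors structure the paper studies.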